ABSTRACT

Results of research conducted to ascertain the effect 
on test grades of changing answer choices are presented. The main 
questions that were examined were: (1) Does the changing of responses
to test items (presumably based upon item reconsideration) result in
better test scores?; (2) Is the amount of changes related to the
score a person receives on the test?; (3) Is the pattern of changes
related to the score a person receives on the test?; and (4) Does
item difficulty correlate with the probability of changing a given
item? Subjects were 178 university students taking final exams. The 
tests were composed of true-false and multiple-choice items, and most 
students had more than adequate time to reconsider items. All 
subjects were made aware of the nature of the research. The results 
are presented as to the effects of response changes on test scores, 
relationship of amount of changes to score, relationship of pattern 
of changes to score, and relationship of item difficulty to response 
changing. The current research underscores a previous finding that 
students who change their responses raise their scores; however, 
repeated changes on the same item did not improve their scores. The
results also show that item difficulty is positively correlated with 
the probability of changing a given item, indicating that reasoned 
reconsideration of items was involved rather than chance misreadings 
of items. It also has bearing on amount and pattern of changes. (DB) 

To Change or Not to Change Item Responses When Taking Tests:
Empirical Evidence for Test Takers

Most, if not all, college students enter a given course expecting or
at least wanting to earn a commendable grade. To do this in most college
courses, especially at the undergraduate level, it is necessary to do well
on the tests designed by the instructor. Many of these tests are of the
"objective" kind, usually true-false and/or multiple choice. When the test
items are designed to be discriminating (probably most are), more than one
response choice appears appealing. Consequently, the student in a high
stress situation is forced to choose between what seem to him at the
moment to be equally appealing alternatives.

Over the years students have often asked the authors the question,
"Should one reconsider and possibly change a response to a test item or
should he go with his first impulse?" After considering the question we
came to the realization that our reply has been based on our own beliefs
and feelings which have emerged from our own test taking experiences.
That is, we really did not have a rational and valid basis for making a
reply. Our search through the literature has led us to believe that other
persons' responses to the same question have been on the same basis.

Many authors give advice on this point. Morgan and Deese (1957,
p. 77), after telling the students to be sure to carefully look over the
paper for errors, state, "When you re-read your examination, you’ll probably
be tempted to change some of your answers. We have some sound advice on
this point. If you feel strongly that an answer should be changed, change
it. On the other hand if you waver between two answers, not being able to
make up your mind, don’t change the answer you set down originally. A lot
of research on this point has shown that, when you are guessing, your first
guess, based on a careful reading is likely to be your best guess. If you
change your answers when you’re quite unsure of yourself, the chances are
that you’re doing the wrong thing. Remember, your first guess is probably
your best."

Oressel and Jensen (1955, p. 33) concur by saying, "Don’t change any
of your answers unless you find you have made an obvious error." Frederick
(1938, pp. 345-346) says, "Your first thought is generally best. This is a
very good rule to follow in taking an objective test... If the student has
time to think, he may forget the broader aspect which the teacher meant
him to take and get mixed up by details." Ehrlich (1961, p. 276) admonishes
students not to panic late in the test. He says, "You take a real risk of
ruining correct answers." Armstrong (1956, p. 126) says, "If there is any
doubt, leave your first answer."

Huff (1961, pp. 113-115) and Honig (1967, p. 123) seem to recommend
changing answers. They both tell the student to answer the easy items
(those he is sure of) first and then consider the remainder or harder ones
and come to a decision. That is, answer the question. The idea of dealing
with the easy ones first is to build confidence before considering the
others. Honig says, "recheck all answers and never leave the exam before
you have to."

It seems that the advice concerning the changing of answers on an
objective test has been overwhelmingly against making a change. All of
the authors advocate checking back over each item but this seems to be to
look for errors of omission or marking one response when another was actually
intended and the like, not to reconsider and possibly come to a different
conclusion. However, Huff and Honig do not appear to concur. They would
change if a different conclusion seemed better.

A check of student opinions showed that their opinions agreed well with
the writings mentioned. Of more than 300 students in seven of the authors'
classes informally interviewed, more than three out of four indicated by
show of hands that they agreed that re-reading tests and then changing
their answers based on reconsideration of the items involved would tend to
lower their test scores.

With both students and written sources strongly favoring not changing
responses to test items, it would seem reasonable to expect some scientific
grounds on which these opinions are based. However, literature dating
back over 40 years indicates just the opposite. Lehman (1928), Lowe (1929),
Matthews (1929), Berrein (1939), and Reille and Briggs (1952) all report that
changing answers (probably based on item reconsiderations) tends to raise
scores more than it lowers them.

The purpose of the present report is to update and expand the results
of the earlier studies, particularly the Reille and Briggs study. By
reporting the responses of a different population of students and by using 
somewhat different research procedures, the present report should extend 
the applicability of the older results to today’s students. 

The main questions that were examined were:

1. Does the changing of responses to test items (presumably based
upon item reconsideration) result in better test scores?

2. Is the amount of changes related to the score a person receives
on the test?

3. Is the pattern of changes related to the score a person receives
on the test?

4. Does item difficulty correlate with the probability of changing
a given item?



PROCEDURE 

Subjects were 178 summer-session education students taking final exams
in three classes at the University of Wisconsin at Oshkosh in August, 1971.
The three classes were Child Growth and Development (N=22), Basic
Educational Psychology (N=100), and Educational Measurement and Evaluation
(N=56). The tests were composed of true-false and multiple-choice items.
The tests had 61, 64, and 75 items for the Child, Basic, and Measurement
classes respectively. It can be concluded that most students had more than
adequate time to reconsider items because they handed their tests in before
they were called for.

Reille and Briggs (1952) report a study very similar to the present
one. The prime difference between their study and the present one is that
all subjects in the present study were made clearly aware of the nature of
the research that would be done whereas Reille and Briggs' subjects took a
final exam unaware that their response-changing behavior would be studied.

Each subject in the present study was given his test, answer sheet,
and a sheet entitled "Changed Response Record Sheet." On this sheet there
were vertically arranged numbers corresponding to the numbers of the test
items and three blanks beside each number on which to record changes of
responses. In the event that a subject changed a response, he recorded the
change on this "Changed Response Record Sheet" by placing his original
response in the first blank by the item and his second response in the
second blank by the item. The heading for the first column of blanks was
"From"; for the second column was "To"; and for the third column was "To".
This indicated what a response was changed "from" and "to" in the event of
one response change for an item and "to" a second time in the event of a
second change for the item. The following directions were then given:

I am doing research to find the effect on test grades of changing
answer choices. If you do not change a response your data sheet will be
blank except for your name and the time at which you hand in your answer
sheet. If you change responses on your answer sheet, record your changes
on the data sheet. An example of this would be John Smith who the first
time he went through the test put down an "E" as the answer to item 33.
When he looked at 33 again, he decided "E" was wrong so he changed his
answer to "B". He looked at 33 a third time and decided "C" was the correct
answer. When he handed in his paper he had "C" as his answer on his answer
sheet and E, B, C in the three blanks respectively on his data sheet. Are
there any questions?

It should be noted that the above instructions are somewhat biased to
favor reconsideration of the items.

RESULTS 

Effects of response changes on test scores. There were a total of 294
response changes obtained from the 178 students. Fifty of the students did
not change any responses. The maximum number of changed responses by one
student was eight. Of the total of 294 changes, 21 responses obtained from
16 students were changed two or more times. Of the 21, three were triply
changed responses. On the first change, nine of the 21 were from wrong to
right, nine were from right to wrong, and three were from wrong to wrong.
Only eight of the twenty-one were right on the last response. The three
triply changed items were all wrong on the final response. Because it
would have little effect on the results, it was decided to include the 21
first changes (nine wrong-to-right, nine right-to-wrong and three
wrong-to-wrong) in the overall analysis of results.

Of the 294 response changes, 166 were from wrong-to-right, 79 were
from right-to-wrong, and 49 were from wrong-to-wrong. If we ignore the 49
wrong-to-wrong changes because they do not affect a student’s final score,
on any of the remaining 245 responses odds were .68 that the student was
improving his test score and .32 that he was lowering it. If .68 is a
representative random sample proportion, then odds are about 99 in 100 that
the true proportion of wrong-to-right changes is somewhere between .61 and
.75 when these response changes are the result of the ordinary deliberation
used in taking tests. The .68 proportion agrees well with the Reille and
Briggs (1952) data reported in Table I in their study. They found 476
wrong-to-right and 224 right-to-wrong changes, which gives, accurate to two
decimal places, the same proportion as found in the present study and
which the authors of the present study presume is just an interesting
coincidence.
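
The proportions quoted above can be checked with a few lines of arithmetic. The sketch below, in Python, recomputes the wrong-to-right proportion from the 166 and 79 reported changes (245 score-affecting changes in all) and builds a normal-approximation interval around it. The paper does not state how its .61 to .75 interval was obtained, so the interval method and the z value of 2.576 are assumptions of this sketch, and its lower bound may differ slightly from the reported one. The function name is illustrative only.

```python
import math

def wald_interval(successes, n, z):
    """Normal-approximation interval for a proportion (an assumed method)."""
    p = successes / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p, p - half_width, p + half_width

# Counts reported in the text; wrong-to-wrong changes are excluded
# because they do not affect the final score.
wrong_to_right = 166
right_to_wrong = 79
n_scored = wrong_to_right + right_to_wrong  # 245 score-affecting changes

# z = 2.576 is the usual two-sided 99% value (assumption, not from the paper).
p_hat, lower, upper = wald_interval(wrong_to_right, n_scored, z=2.576)
print(f"proportion improving: {p_hat:.2f}")
print(f"approximate 99% interval: {lower:.2f} to {upper:.2f}")

# Same proportion for the 476 and 224 changes cited from the 1952 study.
earlier = 476 / (476 + 224)
print(f"earlier study proportion: {earlier:.2f}")
```

Both data sets round to a proportion of .68, which is the agreement the text describes.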

Relationship of amount of changes to score. Because the different
tests were not directly comparable, only the data from the 100 students in
the Basic Educational Psychology class were used to find whether or not the
amount of changes was related to the score a person received on the test.
The observed correlation (r = -.15) indicated a nonsignificant tendency for
people with higher scores to have fewer changed responses. Since it is
possible that a sizeable number of both high and low scoring students did
not think it relevant to reexamine test items, the data of the 28 students
with zero response changes were discarded and the data on the remaining 72
were analyzed. In this case, the numbers of response changes were found to
be significantly related to test scores (r = -.27, p < .025).
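
One common way to judge correlations like these is to convert r to a t statistic with n - 2 degrees of freedom, t = r * sqrt(n - 2) / sqrt(1 - r^2). The paper does not say which test was used, so the following Python sketch is an assumption about method. The critical values are standard two-tailed .05 table values, and the function name is illustrative.

```python
import math

def t_from_r(r, n):
    """t statistic for testing a Pearson correlation against zero."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)

# All 100 Basic Educational Psychology students (r = -.15).
t_all = t_from_r(-0.15, 100)

# The 72 students who changed at least one response (r = -.27).
t_changers = t_from_r(-0.27, 72)

# Two-tailed .05 critical values from a standard t table (assumed cutoffs):
# about 1.984 for df = 98 and about 1.994 for df = 70.
print(f"t(98) = {t_all:.2f}, significant: {abs(t_all) > 1.984}")
print(f"t(70) = {t_changers:.2f}, significant: {abs(t_changers) > 1.994}")
```

Under these assumptions the full-class correlation falls short of the cutoff while the correlation among the 72 changers exceeds it, which matches the pattern reported in the text.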

Relationship of pattern of changes to score. The response tendencies
of the top and bottom 27% of the combined classes were compared to see if
the pattern of changes for the high scorers was significantly different from
that of low scorers. There was a nonsignificant tendency (χ² (df=2) = 3.39;
p<.20) for low scorers to do more poorly than high scorers when changing
responses. The high scorers had proportionately somewhat more wrong-to-right
changes but even the low scorers had more wrong-to-right changes than
right-to-wrong.
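
For a chi-square statistic with two degrees of freedom, the upper-tail probability has the closed form exp(-x/2), so the reported p < .20 can be checked without tables. A minimal Python sketch follows, with an illustrative function name; it uses only the statistic given in the text, since the underlying contingency table is not reported.

```python
import math

def chi2_sf_df2(x):
    """Upper-tail probability of a chi-square variable with df = 2."""
    return math.exp(-x / 2)

p_value = chi2_sf_df2(3.39)  # statistic reported in the text
print(f"p = {p_value:.3f}")

# The .05 critical value for df = 2 is -2 * ln(.05), roughly 5.99,
# so 3.39 falls short of conventional significance.
critical_05 = -2 * math.log(0.05)
print(f"critical value at .05: {critical_05:.2f}")
```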

Relationship of item difficulty to response changing. For the analysis
relating to item difficulty, only the data from the 100 Basic Educational
Psychology students were used. There was a low but significant correlation
between item difficulty and the number of people who changed their responses
to that item (r=.25; p<.025). This effect means that the more difficult the
item was, the more likely people were to change their responses to that item,
but only to a slight degree.

DISCUSSION AND CONCLUSIONS 

Perusal of the leading educational and psychological testing texts
reveals no mention of studies dealing with strategies of taking tests; yet,
research has been available for over forty years indicating that
reconsidering test items tends to raise scores. The current research
underscores the validity of this finding that students who change their
responses raise their scores. On the other hand, repeated changes on the
same item did not help the 16 students who tried it in this study.
Apparently, for those items, the students were too confused about the items
to make anything beyond chance improvement or decrement.

Probably the last three results analyzed should be discussed together.
All are probably affected by severely attenuated data (too many zero scores
in terms of number of changes and maximum number of changed responses per
student only eight). In the research of Reille and Briggs (1952) the test
was 130 items whereas the longest test in the current study was 64 items.
The mean number of changes in the Reille and Briggs study was 7.8 whereas
in the current study the mean number was 1.65. It is therefore not
surprising that the results of this study with respect to the relationships
of amount and pattern of changes to scores are in the same direction as
earlier results but are not as strong.

The fact that item difficulty is positively correlated with the
probability of changing a given item has great bearing on all results of
this study. It indicates that it was in fact reasoned reconsideration of
items that was involved rather than simply chance misreadings of items. It
also has bearing on amount and pattern of changes. Presumably more
knowledgeable students are in the position that on an average all items
are easier for them than they are for less knowledgeable students. This
in turn explains both the relationship of amount of changes and the
relationship of pattern of changes to test scores. People who knew more
had fewer changes because they were more confident of their knowledge.
People who knew more tended to be correct upon changing responses
proportionately more often than those who knew less, again reflecting
more confidence in their knowledge. However, it should be reemphasized
that even the low scorers helped their scores more than they hindered
them by reconsidering items.

The authors would like at this point to submit a cognitive hypothesis
on the reason it is helpful to reconsider test items. Jarrett (1948)
discounts "subliminal response tendencies." Presumably students taking
exams have their memories jogged by other items or other reminiscences such
that upon reconsideration of an item after having done other items they are
more likely to be able to reason out the correct answer. It might be added
that many times stems do give information that might be useful in answering
other questions. What is being said then is that students do think and
that optimum utilization of thinking processes during testing involves
reconsideration of items. If much of reasoning involves internal responses
which can be above or below threshold, then Jarrett’s original hypothesis
might be right.

Based on the "how-to-study" authors cited, the informal survey of 300
students, and the results of the current and previous studies, two main
conclusions seem justified. First, for those students who do not go over
the exam, both the reliability and validity of the exam are being lowered.
Second, many students have been led astray by professors and peers telling
them to stick to their first choices.